Towards a MapReduce Application Performance Model

Authors

  • Jared Gray
  • Thomas C. Bressoud
Abstract

In the modern age, our ability to generate large data sets far outpaces our capacity for analyzing them. Google’s proposed solution to this fundamental problem – the MapReduce paradigm and runtime system – has recently gained traction in the scientific and “big data” industries. However, the performance characteristics of MapReduce are not well known. This paper builds on the efforts of prior research to more accurately characterize and model the performance of MapReduce applications on large-scale distributed systems.

Similar resources

A Model-driven Approach for Price/Performance Tradeoffs in Cloud-based MapReduce Application Deployment

This paper describes preliminary work in developing a model-driven approach to conducting price/performance tradeoffs for Cloud-based MapReduce application deployment. The need for this work stems from the significant variability in both the MapReduce application characteristics and price/performance characteristics of the underlying cloud platform. Our approach involves a model-based machine learn...

Towards Energy Efficient MapReduce

Energy considerations are important for Internet datacenter operators, and MapReduce is a common Internet datacenter application. In this work, we use the energy efficiency of MapReduce as a new perspective for increasing Internet datacenter productivity. We offer a framework to analyze software energy efficiency in general, and MapReduce energy efficiency in particular. We characterize the pe...

Towards an Ontology-Based Semantic Approach to Tuning Parameters to Improve Hadoop Application Performance

Hadoop MapReduce helps companies and researchers process large volumes of data. Hadoop has many configuration parameters that must be tuned to obtain better application performance. However, inexperienced users cannot easily find the best parameter settings. Therefore, it is necessary to create environments that promote and motivate information shar...

Towards Control of MapReduce Performance and Availability

MapReduce is a popular programming model for distributed data processing and Big Data applications. Extensive research has been conducted either to improve the dependability or to increase performance of MapReduce, ranging from adaptive and on-demand fault-tolerance solutions, adaptive task scheduling techniques to optimized job execution mechanisms. This paper investigates a novel solution tha...

Using Realistic Simulation to Identify I/O Bottlenecks in MapReduce Setups

The exponentially growing data demands of modern enterprise and scientific applications pose critical challenges in sustaining the applications at scale. The MapReduce [1] programming model has served as the key enabler for executing resource-intensive applications over huge datasets. However, its configuration design-space has not been studied in detail. This is a complex problem as a typical...

Publication date: 2014